AIbase

# Efficient vision-language model

OmniGen2
Apache-2.0
OmniGen2 is an efficient unified multimodal model that combines a 3B vision-language model with a 4B diffusion model. It supports visual understanding, text-to-image generation, instruction-guided image editing, and in-context generation.
Text-to-Image
OmniGen2
136
5
MobileCLIP-B-LT-OpenCLIP
MobileCLIP-B (LT) is an efficient image-text model developed by Apple. Trained with multi-modal reinforced training, it delivers fast zero-shot image classification and is reported to outperform similar models.
Text-to-Image
apple
774
9
MobileVLM 1.7B
Apache-2.0
MobileVLM is a lightweight vision-language model designed for mobile devices, supporting efficient image understanding and text generation.
Text-to-Image Transformers
mtgv
647
15
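
The zero-shot image classification mentioned for MobileCLIP-B (LT) above can be illustrated with a minimal sketch. It assumes the general CLIP-style recipe (compare one image embedding against one text embedding per candidate label, then apply a softmax) rather than anything specific to Apple's implementation. The toy 3-dimensional embeddings, the `zero_shot_classify` helper, and the `logit_scale` value of 100 are illustrative assumptions; a real model would produce the embeddings from its image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_embedding, text_embeddings, labels, logit_scale=100.0):
    """Return a label -> probability mapping from embedding similarities.

    logit_scale is an assumed temperature; CLIP-style models learn this value.
    """
    image = l2_normalize(image_embedding)
    logits = []
    for text in text_embeddings:
        text = l2_normalize(text)
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(logit_scale * cosine)

    # Numerically stable softmax over the candidate labels.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy embeddings standing in for encoder outputs (hypothetical values).
labels = ["cat", "dog", "car"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.9, 0.1, 0.0]

probs = zero_shot_classify(image_embedding, text_embeddings, labels)
best_label = max(probs, key=probs.get)
print(best_label)
```

No label-specific training is involved: adding a new class only requires a new text embedding, which is why this style of model is described as zero-shot.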
© 2025 AIbase